The present invention relates to apparatus and methods for decoding
information that has been embedded in information signals, such as audio, video, or data
signals.



Watermarking of information signals is a technique for the transmission of
additional data along with the information signal. For instance, watermarking techniques can
be used to embed copyright and copy control information into audio signals.

The main requirement of a watermarking scheme is that it is not observable
(i.e. in the case of an audio signal, it is inaudible) whilst being robust to attacks to remove the
watermark from the signal (e.g. removing the watermark will damage the signal). It will be
appreciated that the robustness of a watermark will normally be a trade off against the quality
of the signal in which the watermark is embedded. For instance, if a watermark is strongly
embedded into an audio signal (and is thus difficult to remove) then it is likely that the
quality of the audio signal will be reduced.

In digital devices, it is typically assumed that there exists up to a 1% drift in
sampling (clock) frequency. During transmission of the signal through an analog channel,
this drift is normally manifested as a stretch or shrink in the time domain signal (i.e. a linear
time scale change). A watermark embedded in the time domain (e.g. in an audio signal) will
be affected by this time stretch or shrink as well, which can make watermark detection very
difficult or even impossible. Thus, in the implementation of a robust watermarking scheme, it
is extremely important to find solutions to such time scale modifications.

In known time domain watermarking schemes, any linear time scale change
within the signal is resolved by repeatedly running the watermark detection (including
repeating the extraction of the watermark from the host signal) for the different possible time
scales, until all the possible time scales are exhausted, or detection is achieved. Performing
such searches over the possible time scaling ranges requires a large computational overhead,
and is thus costly in terms of both hardware and computational time. Consequently, real time
implementation of a watermark detector utilizing such a time scale search technique is not
feasible.

In watermarking schemes implemented within the frequency domain, it is
common to perform the scale search by modifying the frequency domain coefficients. For
instance, this can be achieved by carefully shrinking or stretching the frequency domain
samples. In principle, such a frequency domain solution could be directly applied to time
domain watermark signals. However, since the watermarks are directly embedded in the time
domain samples, the time scale search needs to be performed in the time domain as well.
Normally, there are only a few thousand frequency domain samples, whilst the time domain
signals contain samples in the order of millions. Consequently, such an application of the
frequency domain solution to time domain signals is computationally too expensive.



It is an object of the present invention to provide a watermark decoding 
scheme for time domain watermarked signals that utilizes a time scale search that 
substantially addresses at least one of the problems of the prior art. 

In a first aspect, the present invention provides a method of compensating for
a linear time scale change in a received signal, the signal being modified by a sequence of
symbols in the time domain, the method comprising the steps of: (a) extracting an initial
estimate of the sequence of symbols from said received signal; (b) forming an estimate of a
correctly time scaled sequence of the symbols by interpolating the values of said initial
estimate.

Preferably, step (b) is repeated so as to provide a range of estimates
corresponding to different time scalings.

Preferably, said interpolation is at least one of zeroth order interpolation, linear 
interpolation, quadratic interpolation and cubic interpolation. 

Preferably, the method further comprises the step of processing each estimate
as though it were the correctly time scaled sequence of the symbols, so as to determine which
estimate is the best estimate.

Preferably, the method further comprises the steps of correlating each of said 
estimates with a reference corresponding to said sequence of symbols; and taking the 
estimate with the maximum correlation peak as the best estimate. 
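
The two preferred steps above (interpolating the stored estimate for several candidate time scalings, then keeping the candidate with the largest correlation peak) can be sketched as follows. This is only an illustrative sketch: the function names `rescale`, `circular_correlation` and `best_scale`, the use of linear interpolation, and the convention that a stretch by (1 + eta) is undone by reading the buffer at positions n(1 + eta) are all assumptions, not details taken from the specification.

```python
import numpy as np

def rescale(buffer, eta):
    """Resample `buffer` by reading it at positions n * (1 + eta) (assumed convention)."""
    n = np.arange(len(buffer))
    positions = n * (1.0 + eta)
    # Linear interpolation; positions past the end reuse the last stored value.
    return np.interp(positions, n, buffer)

def circular_correlation(estimate, reference):
    """Circular cross-correlation of two equal-length sequences via the FFT."""
    spectrum = np.fft.fft(estimate) * np.conj(np.fft.fft(reference))
    return np.real(np.fft.ifft(spectrum))

def best_scale(buffer, reference, etas):
    """Return (eta, peak) for the candidate scale with the largest correlation peak."""
    peaks = [np.max(np.abs(circular_correlation(rescale(buffer, eta), reference)))
             for eta in etas]
    best = int(np.argmax(peaks))
    return etas[best], peaks[best]
```

Note that only the already extracted symbol estimates are interpolated here; the extraction from the received signal is not repeated for each candidate scale.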

Preferably, said initial estimate of the sequence of symbols is stored in a
buffer.

Preferably, said buffer is of total length M, and the total number of scale searches
conducted is

$$ N_s = \frac{M}{2}\left(\eta_{\max} - \eta_{\min}\right), $$

where $\eta_{\min}$ and $\eta_{\max}$ correspond respectively to the minimum
and maximum likely time scale modifications of the signal.

Preferably, said initial estimate of the sequence of symbols comprises a
sequence of $N_b$ estimates for each symbol, each of the $N_b$ estimates corresponding to a
different time offset of a symbol.

Preferably, the scale search in the next detection window is adapted based on 
the information acquired during the current detection window. 

Preferably, the scale space is searched using an optimal searching algorithm.

Preferably, the searching algorithm is the grid refinement algorithm.

In another aspect, the present invention provides a computer program arranged 
to perform the method as described above. 

In further aspects, the present invention provides a record carrier comprising
the computer program, and a method of making available for downloading the computer
program.

In another aspect, the present invention provides an apparatus arranged to
compensate for a linear time scale change in a received signal, the signal being modified by a
sequence of symbols in the time domain, the apparatus comprising: an extractor arranged to
extract an initial estimate of the sequence of symbols from said received signal; and an
interpolator arranged to form an estimate of a correctly time scaled sequence of the symbols
by interpolating the values of said initial estimate.

Preferably, the apparatus further comprises a buffer arranged to store one or 
more of said estimates. 

In another aspect, the present invention provides a decoder comprising the
apparatus as described above.

For a better understanding of the invention, and to show how embodiments of 
the same may be carried into effect, reference will now be made, by way of example, to the 
accompanying diagrammatic drawings in which: 

Figure 1 is a diagram illustrating a watermark embedding apparatus;

Figure 2 shows a signal portion extraction filter H;

Figures 3a and 3b show respectively the typical amplitude and phase responses
as a function of frequency of the filter H shown in Fig. 2;

Figure 4 shows the payload embedding and watermark conditioning stage of
the apparatus shown in Fig. 1;

Figure 5 is a diagram illustrating the details of the watermark conditioning
apparatus $H_c$ of Fig. 4, including charts of the associated signals at each stage;

Figures 6a and 6b show two preferred alternative window shaping functions
s(n) in the form of respectively a raised cosine function and a bi-phase function;

Figures 7a and 7b show respectively the frequency spectra for a watermark 
sequence conditioned with a raised cosine and a bi-phase shaping window function; 

Figure 8 is a diagram illustrating a watermark detector in accordance with an
embodiment of the present invention;

Figure 9 diagrammatically shows the whitening filter $H_w$ of Fig. 8, for use in
conjunction with a raised cosine shaping window function;

Figure 10 diagrammatically shows the whitening filter $H_w$ of Fig. 8, for use in
conjunction with a bi-phase window shaping function;

Figure 11 shows details of the watermark symbol extraction and buffering
processes in accordance with an embodiment of the present invention;

Figure 12 illustrates a sequence in which estimates of watermark symbols are 
collected from four buffers when there is no time scale modification; 

Figures 13a and 13b illustrate the different sequences, according to an
embodiment of the present invention, in which estimates of watermark symbols can be
collected from four buffers when there is respectively a time stretch and a time shrink time
scale modification;

Figure 14 shows an example of an efficient scale search technique based on
the concept of grid refinement; and

Figure 15 shows a typical shape of the correlation function output from the
correlator of the watermark detector shown in Fig. 8.

Fig. 1 shows a block diagram of the apparatus required to perform the digital
signal processing for embedding a multi-bit payload watermark w into a host signal x.

A host signal x is provided at an input 12 of the apparatus. The host signal x is
passed in the direction of output 14 via the adder 22. However, a replica of the host signal x
(input 8) is split off in the direction of the multiplier 18, for carrying the watermark
information.

The watermark signal $w_c$ is obtained from the payload embedder and
watermark conditioning apparatus 6, and derived from a reference finite length random
sequence $w_s$ input to the payload embedder and watermark conditioning apparatus. The
multiplier 18 is utilized to calculate the product of the watermark signal $w_c$ and the replica
audio signal x. The resulting product, $w_c x$, is then passed via a gain controller 24 to the adder
22. The gain controller 24 is used to amplify or attenuate the signal by a gain factor $\alpha$.

The gain factor $\alpha$ controls the trade off between the audibility and the
robustness of the watermark. It may be a constant, or variable in at least one of time,
frequency and space. The apparatus in Fig. 1 shows that, when $\alpha$ is variable, it can be
automatically adapted via a signal analyzing unit 26 based upon the properties of the host
signal x. Preferably, the gain $\alpha$ is automatically adapted, so as to minimize the impact on the
signal quality, according to a properly chosen perceptibility cost-function, such as a
psycho-acoustic model of the human auditory system (HAS) in case of an audio signal. Such a model
is, for instance, described in the paper by E. Zwicker, "Audio Engineering and
Psychoacoustics: Matching signals to the final receiver, the Human Auditory System",
Journal of the Audio Engineering Society, Vol. 39, pp. 115-126, March 1991.

In the following, an audio watermark is utilized, by way of example only, to 
describe this embodiment of the present invention. 

The resulting watermarked audio signal y is then obtained at the output 14 of the
embedding apparatus 10 by adding an appropriately scaled version of the product of $w_c$ and x
to the host signal:

$$ y[n] = x[n] + \alpha\, w_c[n]\, x[n]. \tag{1} $$

Preferably, the watermark $w_c$ is chosen such that when multiplied with x, it
predominantly modifies the short time envelope of x.

Fig. 2 shows one preferred embodiment in which the input 8 to the multiplier
18 in Fig. 1 is obtained by filtering a replica of the host signal x using a filter H in the
filtering unit 15. If the filter output is denoted by $x_b$, then according to this preferred
embodiment, the watermarked signal is generated by adding the product of $x_b$ and the
watermark $w_c$ to the host signal x:

$$ y[n] = x[n] + \alpha\, w_c[n]\, x_b[n]. \tag{2} $$
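
As a minimal sketch of the embedding step in equations (1) and (2), the following multiplies the (optionally filtered) host signal by the watermark and adds the scaled product back to the host. The name `embed` and the `band_filter` callable standing in for the filter H are hypothetical and used only for illustration.

```python
import numpy as np

def embed(x, w_c, alpha, band_filter=None):
    """Return y[n] = x[n] + alpha * w_c[n] * x_b[n].

    `band_filter` is a hypothetical callable standing in for the filter H.
    With band_filter=None, x_b = x and the result matches equation (1).
    """
    x = np.asarray(x, dtype=float)
    x_b = x if band_filter is None else np.asarray(band_filter(x), dtype=float)
    return x + alpha * np.asarray(w_c, dtype=float) * x_b
```

For example, `embed([1.0, -2.0, 4.0], [1.0, 0.0, -1.0], 0.1)` scales the first sample up by 10%, leaves the second untouched and scales the third down by 10%, which is the envelope modulation the text describes.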

Let $\bar{x}_b$ be defined such that $\bar{x}_b = x - x_b$, and $y_b$ be defined such that
$y = y_b + \bar{x}_b$. Then the envelope modulated portion $y_b$ of the watermarked signal y is given as

$$ y_b[n] = (1 + \alpha\, w_c[n])\, x_b[n]. \tag{3} $$

Preferably, as shown in Fig. 3, the filter H is a linear phase band pass filter
characterized by its lower cut-off frequency $f_L$ and upper cut-off frequency $f_H$. As can be seen
in Fig. 3b, the filter H has a linear phase response with respect to frequency f within the
pass-band (BW). Thus, when H is a band pass filter, $x_b$ and $\bar{x}_b$ are the in-band and out-of-band
components of the host signal respectively. For optimum performance, it is preferable that
the signals $x_b$ and $\bar{x}_b$ are in phase. This is achieved by appropriately compensating for the
phase distortion produced by filter H. In the case of a linear phase filter, the distortion is a
simple time delay.

In Fig. 4, the details of the payload embedder and watermark conditioning unit
6 are shown. In this unit, the initial reference random sequence $w_s$ is converted into a multi-bit
watermark signal $w_c$.

Firstly, a finite length, preferably zero mean and uniformly distributed random
sequence $w_s$, from now on also referred to as the watermark seed signal, is generated using a
random number generator with an initial seed S. As will be appreciated later, it is preferable
that this initial seed S is known to both the embedder and the detector, such that a copy of the
watermark signal can be generated at the detector for comparison purposes. This results in
the sequence of length $L_w$:

$$ w_s[k] \in [-1, 1], \quad \text{for } k = 0, 1, 2, \ldots, L_w - 1. \tag{4} $$

It should be noted that in some applications, the seed can be transmitted to the 
detector via an alternate channel or can be derived from the received signal using some pre- 
determined protocol. 

Then the sequence $w_s$ is circularly shifted by the amounts $d_1$ and $d_2$ using the
circularly shifting unit 30 to obtain the random sequences $w_{d1}$ and $w_{d2}$ respectively. It will be
appreciated that these two sequences ($w_{d1}$ and $w_{d2}$) are effectively a first sequence and a
second sequence, with the second sequence being circularly shifted with respect to the first.
Each sequence $w_{di}$, i = 1, 2, is subsequently multiplied with a respective sign bit $r_i$ in the
multiplying unit 40, where $r_i$ = +1 or -1. The respective values of $r_1$ and $r_2$ remain constant,
and only change when the payload of the watermark is changed. Each sequence is then
converted into a periodic, slowly varying narrow-band signal $w_i$ of length $L_w T_s$ by the
watermark conditioning circuit 20 shown in Fig. 4. Finally, the slowly varying narrow-band
signals $w_1$ and $w_2$ are added with a relative delay $T_r$ (where $T_r < T_s$) to give the multi-bit
payload watermark signal $w_c$. This is achieved by first delaying the signal $w_2$ by the amount
$T_r$ using delaying unit 45 and subsequently adding it to $w_1$ with the adding unit 50.

Fig. 5 shows the watermark conditioning apparatus 20 used in the payload
embedder and watermark conditioning apparatus 6 in more detail. The watermark seed signal
$w_s$ is input to the conditioning apparatus 20.

For convenience, the modification of only one of the sequences $w_{di}$ is shown
in Fig. 5, but it will be appreciated that each of the sequences is modified in a similar manner,
with the results being added to obtain the watermark signal $w_c$.

As shown in Fig. 5, each watermark signal sequence $w_{di}[k]$, i = 1, 2, is applied to
the input of a sample repeater 180. Chart 181 illustrates one of the sequences $w_{di}$ as a
sequence of values of random numbers between +1 and -1, with the sequence being of length
$L_w$. The sample repeater repeats each value within the watermark seed signal sequence $T_s$
times, so as to generate a rectangular pulse train signal. $T_s$ is referred to as the watermark
symbol period and represents the span of the watermark symbol in the audio signal. Chart
183 shows the results of the signal illustrated in chart 181 once it has passed through the
sample repeater 180.

A window shaping function s[n], such as a raised cosine window, is then
applied to convert the rectangular pulse functions derived from $w_{d1}$ and $w_{d2}$ into slowly
varying watermark sequence functions $w_1[n]$ and $w_2[n]$ respectively.

Chart 184 shows a typical raised cosine window shaping function, which is
also of span $T_s$.

The generated watermark sequences $w_1[n]$ and $w_2[n]$ are then added up with a
relative delay $T_r$ (where $T_r < T_s$) to give the multi-bit payload watermark signal $w_c[n]$, i.e.,

$$ w_c[n] = w_1[n] + w_2[n - T_r]. \tag{5} $$

The value of $T_r$ is chosen such that the zero crossings of $w_1$ match the
maximum amplitude points of $w_2$ and vice-versa. Thus, for a raised cosine window shaping
function $T_r = T_s/2$, and for a bi-phase window shaping function $T_r = T_s/4$. For other window
shaping functions, other values of $T_r$ are possible.
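
The conditioning chain described above (circular shift, sign bit, sample repeater, window shaping, delayed addition) can be sketched as below. The exact raised cosine expression, the direction of the circular shift performed by `np.roll`, and the function names are assumptions made for illustration; the delay is applied circularly because the conditioned signals are described as periodic.

```python
import numpy as np

def raised_cosine(T_s):
    """One raised cosine lobe spanning a symbol period (assumed form)."""
    n = np.arange(T_s)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / T_s))

def condition(w_d, r, T_s, window):
    """Apply sign bit r, repeat each symbol T_s times, then shape with the window."""
    pulses = np.repeat(r * np.asarray(w_d, dtype=float), T_s)  # sample repeater
    return pulses * np.tile(window, len(w_d))                  # window shaping

def payload_watermark(w_s, d1, d2, r1, r2, T_s, T_r):
    """Build w_c[n] = w_1[n] + w_2[n - T_r] from the seed sequence w_s."""
    window = raised_cosine(T_s)
    w_1 = condition(np.roll(w_s, d1), r1, T_s, window)
    w_2 = condition(np.roll(w_s, d2), r2, T_s, window)
    # Circular delay of w_2 by T_r samples.
    return w_1 + np.roll(w_2, T_r)
```

With a raised cosine window and `T_r = T_s // 2`, the peaks of the first conditioned sequence fall on the zeros of the delayed second one, as the text requires.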






WO 03/083859 PCT/IB03/00794 


As will be appreciated from the description below, during detection the
correlation of $w_c[n]$ will generate two correlation peaks that are separated by pL' (as can be
seen in Fig. 15). pL' is an estimate of the circular shift pL between $w_{d1}$ and $w_{d2}$, which is part
of the payload, and is defined as

$$ pL = |d_2 - d_1| \bmod \left\lceil \frac{L_w}{2} \right\rceil. \tag{6} $$

In addition to pL, extra information can be encoded by changing the relative
signs of the embedded watermarks.

In the detector, this is seen as a relative sign $r_{sign}$ between the correlation
peaks. It may be defined as:

$$ r_{sign} = \frac{2\rho_1 + \rho_2 + 3}{2} \in \{0, 1, 2, 3\}, \tag{7} $$

where $\rho_1 = \mathrm{sign}(cL_1)$ and $\rho_2 = \mathrm{sign}(cL_2)$ are respectively estimates of the sign bits $r_1$ (input 80)
and $r_2$ (input 90) of Fig. 4, and $cL_1$ and $cL_2$ are the values of the correlation peaks
corresponding to $w_{d1}$ and $w_{d2}$ respectively. The overall watermark payload $pL_w$, for an
error-free detection, is then given as a combination of $r_{sign}$ and pL:

$$ pL_w = \langle r_{sign}, pL \rangle. \tag{8} $$

The maximum information ($I_{max}$), in number of bits, that can be carried by a
watermark sequence of length $L_w$ is thus given by:

$$ I_{max} = \log_2\left(4 \left\lceil \frac{L_w}{2} \right\rceil\right) \text{ bits}. \tag{9} $$
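
A small worked example may help here. It reads equation (7) as r_sign = (2ρ1 + ρ2 + 3)/2 and equation (9) as I_max = log2(4·ceil(L_w/2)); both readings are assumptions about the intended formulas, and the function names are illustrative only.

```python
import math

def r_sign(rho_1, rho_2):
    """Map two peak signs (each +1 or -1) to a value in {0, 1, 2, 3}."""
    return (2 * rho_1 + rho_2 + 3) // 2

def max_payload_bits(L_w):
    """Payload capacity in bits under the assumed reading of equation (9)."""
    return math.log2(4 * math.ceil(L_w / 2))
```

Under these assumptions, the four sign combinations map to four distinct values (two extra bits), and a sequence of length 1024 would carry log2(4 × 512) = 11 bits.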



In such a scheme, the payload is immune to relative offset between the 
embedder and the detector, and also to possible time scale modifications. 

The window shaping function has been identified as one of the main
parameters that controls the robustness and audibility behavior of the present watermarking
scheme. As illustrated in Figs. 6a and 6b, two examples of possible window shaping functions
are herein described: a raised cosine function and a bi-phase function.




It is preferable to use a bi-phase window function instead of a raised cosine
window function, so as to obtain a quasi DC-free watermark signal. This is illustrated in
Figs. 7a and 7b, showing the frequency spectra corresponding to a watermark sequence (in
this case a sequence of $w_{di}[k]$ = {1, 1, -1, 1, -1, -1}) conditioned with respectively a raised
cosine and a bi-phase window shaping function. As can be seen, the frequency spectrum for
the raised cosine conditioned watermark sequence has a maximum at frequency f = 0, whilst
the frequency spectrum for the bi-phase shaped watermark sequence has a minimum at f = 0,
i.e. it has very little DC component.

Useful information is only contained in the non-DC component of the
watermark. Consequently, for the same added watermark energy, a watermark conditioned
with the bi-phase window will carry more useful information than one conditioned by the
raised cosine window. As a result, the bi-phase window offers superior audibility
performance for the same robustness or, conversely, it allows a better robustness for the same
audibility quality.
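
The DC argument can be checked numerically with a short sketch. The bi-phase shape used here (a positive raised cosine lobe followed by an equal negative lobe) is an assumption, since the exact function of Fig. 6b is not given in the text, and the helper names are illustrative.

```python
import numpy as np

def raised_cosine(T_s):
    """One raised cosine lobe spanning T_s samples (assumed form)."""
    n = np.arange(T_s)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / T_s))

def bi_phase(T_s):
    """Assumed bi-phase shape: a positive lobe followed by an equal negative lobe."""
    half = raised_cosine(T_s // 2)
    return np.concatenate([half, -half])

def dc_fraction(symbols, window):
    """Share of the conditioned sequence's energy at frequency 0 (via Parseval)."""
    w = np.repeat(np.asarray(symbols, dtype=float), len(window)) * np.tile(window, len(symbols))
    return np.sum(w) ** 2 / (len(w) * np.sum(w ** 2))
```

For a symbol sequence with non-zero mean, the raised cosine conditioning leaves a noticeable share of the energy at DC, whereas the assumed bi-phase shape cancels it within every symbol.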

Such a bi-phase function could be utilized as a window shaping function for
other watermarking schemes. In other words, a bi-phase function could be applied to reduce
the DC component of signals (such as a watermark) that are to be incorporated into another
signal.

Fig. 8 shows a block diagram of a watermark detector (200, 300, 400). The
detector consists of three major stages: (a) the watermark symbol extraction stage (200), (b)
the buffering and interpolation stage (300), and (c) the correlation and decision stage (400).

In the symbol extraction stage (200), the received watermarked signal y'[n] is
processed to generate multiple ($N_b$) estimates of the watermark sequence. These estimates
of the watermark sequence are required to resolve any time offset that may exist between the
embedder and the detector, so that the watermark detector can synchronize to the watermark
sequence inserted in the host signal.

In the buffering and interpolation stage (300), these estimates are
de-multiplexed into $N_b$ separate buffers, and an interpolation is applied to each buffer to resolve
time scale modifications that may have occurred, e.g. a drift in sampling (clock) frequency
may have resulted in a stretch or shrink in the time domain signal (i.e. the watermark may
have been stretched or shrunk).

In the correlation and decision stage (400), the content of each buffer is
correlated with the reference watermark and the maximum correlation peaks are compared
against a threshold to determine the likelihood of whether the watermark is indeed embedded
within the received signal y'[n].

In order to maximize the accuracy of the watermark detection, the watermark
detection process is typically carried out over a length of received signal y'[n] that is 3 to 4
times that of the watermark sequence length. Thus each watermark symbol to be detected can
be constructed by taking the average of several estimates of said symbol. This averaging
process is referred to as smoothing, and the number of times the averaging is done is referred
to as the smoothing factor $s_f$. Let $L_D$ be the detection window length, defined as the length of
the audio segment (in number of samples) over which a watermark detection truth-value is
reported. Then, $L_D = s_f L_w T_s$, where $T_s$ is the symbol period and $L_w$ the number of symbols
within the watermark sequence. During symbol extraction, a factor $T_s$ decimation takes place
in the energy computation stage. Thus, the length ($L_b$) of each buffer 320 within the buffering
and interpolation stage is $L_b = s_f L_w$.

In the watermark symbol extraction stage 200 shown in Fig. 8, the incoming
watermarked signal y'[n] is input to the optional signal conditioning filter $H_b$ (210). This filter
210 is typically a band pass filter and has the same behavior as the corresponding filter (H,
15) shown in Fig. 2. The output of the filter is $y'_b[n]$ and, assuming linearity within the
transmission medium, it follows from equations (1) and (3):

$$ y'_b[n] \approx y_b[n] = (1 + \alpha\, w_c[n])\, x_b[n]. \tag{10} $$






Note that in the above expression, the possible time offset between the
embedder and the detector is implicitly ignored. For ease of explanation of the general
watermarking scheme principles, from now on, it is assumed that there is perfect
synchronism between the embedder and the detector (i.e. no offset). An explanation is given
below, however, with reference to Fig. 11, of how to compensate for time offset in accordance
with the present invention.

Note that when no filter is used in the embedder (i.e., when H = 1) then $H_b$ in
the detector can also be omitted, or it can still be included to improve the detection
performance. If $H_b$ is omitted, then $y_b$ in equation (10) is replaced with y. The rest of the
processing is the same.

We assume that the audio signal is divided into frames of length $T_s$, and that
$y'_{b,m}[n]$ is the n-th sample of the m-th filtered frame signal. The energy E[m] corresponding
to the m-th frame is thus:

$$ E[m] = \sum_{n=0}^{T_s - 1} \left| y'_{b,m}[n] \right|^2. \tag{11} $$

Combining this with equation 10, it follows that:

$$ E[m] = \sum_{n=0}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \approx \sum_{n=0}^{T_s - 1} \left| (1 + \alpha\, w_e[m])\, x_{b,m}[n] \right|^2, \tag{12} $$



where $w_e[m]$ is the m-th extracted watermark symbol and contains $N_b$ time-multiplexed
estimates of the embedded watermark sequences. Solving for $w_e[m]$ in equation 12 and
ignoring higher order terms of $\alpha$ gives the following approximation:

$$ w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E[m]}{\sum_{n=0}^{T_s - 1} \left| x_{b,m}[n] \right|^2} - 1 \right). \tag{13} $$



In the watermark extraction stage 200 shown in Fig. 8, the output $y'_b[n]$ of the
filter $H_b$ is provided as an input to a frame divider 220, which divides the audio signal into
frames of length $T_s$, i.e. into $y'_{b,m}[n]$, with the energy calculating unit 230 then being used to
calculate the energy corresponding to each of the framed signals as per equation (12). The
output of this energy calculation unit 230 is then provided as an input to the whitening stage
$H_w$ (240) which performs the function shown in equation 13 so as to provide an output $w_e[m]$.
Alternative implementations (240A, 240B) of this whitening stage are illustrated in Figs. 9
and 10.

It will be realized that the denominator of equation 13 contains a term that
requires knowledge of the host (original) signal x. As the signal x is not available to the
detector, the denominator of equation 13 must be estimated in order to calculate $w_e[m]$.

Below is described how such an estimation can be achieved for the two
described window shaping functions (the raised cosine window shaping function and the
bi-phase window shaping function), but it will equally be appreciated that the teaching could be
extended to other window shaping functions.

In relation to the raised cosine window shaping function shown in Fig. 6a, it
has been realized that the audio envelope induced by the watermark contributes only to the
noisy part of the energy function E[m]. The slowly varying part (i.e. the low frequency
component) is predominantly due to the contribution of the envelope of the original audio
signal x. Thus, equation 13 may be approximated by:

$$ w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E[m]}{\mathrm{lowpass}(E[m])} - 1 \right), \tag{14} $$

where "lowpass(.)" is a low pass filter function. Thus, it will be appreciated that the
whitening filter $H_w$ for the raised cosine window shaping function can be realized as
shown in Fig. 9.

As can be seen, such a whitening filter $H_w$ (240A) comprises an input 242A
for receiving the signal E[m]. A portion of this signal is then passed through the low pass
filter 247A to produce a low pass filtered energy signal $E_{LP}[m]$, which in turn is provided as
an input to the calculation stage 248A along with the function E[m]. The calculation stage
248A then divides E[m] by $E_{LP}[m]$ to calculate the extracted watermark symbol $w_e[m]$.
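
A rough sketch of this raised cosine whitening path is given below: frame energies are computed, a low pass version of the energy sequence stands in for the unknown host envelope, and the ratio is converted to a symbol estimate following the form (E/lowpass(E) - 1)/(2α). The moving average used as the low pass filter, its span, the edge padding and the function names are all assumptions; the specification does not say which low pass filter is used.

```python
import numpy as np

def frame_energies(y_b, T_s):
    """E[m]: sum of squared samples in each non-overlapping frame of length T_s."""
    y_b = np.asarray(y_b, dtype=float)
    n_frames = len(y_b) // T_s
    frames = y_b[:n_frames * T_s].reshape(n_frames, T_s)
    return np.sum(frames ** 2, axis=1)

def whiten_raised_cosine(E, alpha, span=9):
    """Estimate w_e[m] as (E[m] / lowpass(E[m]) - 1) / (2 * alpha).

    The low pass filter is a moving average of odd length `span` (an assumption).
    """
    E = np.asarray(E, dtype=float)
    kernel = np.ones(span) / span
    padded = np.pad(E, span // 2, mode="edge")        # hold edge values
    E_lp = np.convolve(padded, kernel, mode="valid")  # same length as E
    return (E / E_lp - 1.0) / (2.0 * alpha)
```

When the energy sequence is constant, the ratio is one everywhere and the estimate is zero, which is the expected result for an unwatermarked, stationary host.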

When a bi-phase window function is employed in the watermark conditioning
stage of the embedder, a different approach should be utilized to estimate the envelope of the
original audio, and hence to calculate $w_e[m]$.

It will be seen by examination of the bi-phase window function shown in Fig.
6b, that when the audio envelope is modulated with such a window function, the first and the
second halves of the frame are scaled in opposite directions. In the detector, this property is
utilized to estimate the envelope energy of the host signal x.

Consequently, within the detector, each audio frame is first sub-divided into
two halves. The energy functions corresponding to the first and second half-frames are hence
given by

$$ E_1[m] = \sum_{n=0}^{T_s/2 - 1} \left| y'_{b,m}[n] \right|^2 \tag{15} $$

and

$$ E_2[m] = \sum_{n=T_s/2}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \tag{16} $$

respectively. As the envelope of the original audio is modulated in opposite directions within
the two sub-frames, the original audio envelope can be approximated as the mean of $E_1[m]$
and $E_2[m]$.
Further, the instantaneous modulation value can be taken as the difference
between these two functions. Thus, for the bi-phase window function, the watermark $w_e[m]$
can be approximated by:

$$ w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E_1[m] - E_2[m]}{E_1[m] + E_2[m]} \right). \tag{17} $$



Consequently, the whitening filter $H_w$ (240B) in Fig. 8 for a bi-phase window
shaping function can be realized as shown in Fig. 10. Inputs 242B and 243B respectively
receive the energy functions of the first and second half frames $E_1[m]$ and $E_2[m]$. Each
energy function is then split up into two, and provided to adders 245B and 246B which
respectively calculate $E_1[m] - E_2[m]$ and $E_1[m] + E_2[m]$. Both of these calculated functions
are then passed to the calculating unit 248B which divides the value from adder 245B by the
value from 246B so as to calculate $w_e[m]$, containing $N_b$ time-multiplexed estimates of the
embedded watermark sequences, in accordance with equation 17.
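
The half-frame computation performed by the adders 245B, 246B and the calculating unit 248B can be sketched as follows. The sketch returns only the ratio of the difference to the sum; any constant scale factor involving the gain is left out, since a constant factor does not move the position of a correlation peak. The function name and the assumption that every frame has non-zero energy are illustrative choices.

```python
import numpy as np

def whiten_bi_phase(y_b, T_s):
    """Return (E1 - E2) / (E1 + E2) for each frame of even length T_s.

    Assumes every frame has non-zero energy (no silent frames).
    """
    y_b = np.asarray(y_b, dtype=float)
    n_frames = len(y_b) // T_s
    frames = y_b[:n_frames * T_s].reshape(n_frames, T_s)
    E1 = np.sum(frames[:, :T_s // 2] ** 2, axis=1)  # first half-frame energy
    E2 = np.sum(frames[:, T_s // 2:] ** 2, axis=1)  # second half-frame energy
    return (E1 - E2) / (E1 + E2)
```

A frame whose first half is louder than its second half yields a positive value, and a frame with equal half-frame energies yields zero.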

This output $w_e[m]$ is then passed to the buffering and interpolation stage 300
(Fig. 8), where the signal is de-multiplexed by a de-multiplexer 310, buffered in buffers 320
of length $L_b$, so as to resolve a lack of synchronism between the embedder and the detector,
and interpolated within the interpolation unit 330 so as to compensate for a time scale
modification between the embedder and the detector.

In order to maximize the possible robustness of a watermark, it is important to
make sure that the watermarking system is immune to both time offsets and drifts in
sampling frequency between the embedder and the detector. In other words, the watermark
detector must be able to synchronize to the watermark sequence inserted in the host signal.

Fig. 11 illustrates the process carried out by the buffering and interpolation
stage 300 to resolve the offset issue. The example described illustrates the process for
resolving offset when a raised cosine window shaping function has been employed in the
watermark embedding process. However, in principle the same technique is applicable when
the bi-phase window shaping function has been used.

Referring to Fig. 11, after filtering by the filter $H_b$ 210, the incoming audio
signal stream $y'_b[n]$ is separated into preferably overlapping frames 302 of effective length $T_s$
by the frame divider 220.

Preferably, to resolve possible offset between the embedder and the detector,
each frame is divided into $N_b$ sub-frames (304a, 304b, ..., 304x), and the above computations
(equations (12) to (17)) are applied on a sub-frame basis.

Preferably, each sub-frame overlaps with an adjacent sub-frame. In the
example shown, it can be seen that there is a 50% overlap ($T_s/N_b$) of each sub-frame (304a,
304b, ..., 304x), with each of the sub-frames being of length $2T_s/N_b$. When overlapping
sub-frames are considered, the main frames are preferably longer than the symbol period $T_s$ so as
to allow inter-frame overlap as shown in Fig. 11.

The energy of the audio is then computed for each sub-frame by the whitening
stage 240, and the resulting values are de-multiplexed into the $N_b$ buffers 320 by the
de-multiplexer 310. Each one ($B_1$, $B_2$, ..., $B_{N_b}$) of the buffers 320 will thus contain a sequence of
values, with the first buffer $B_1$ containing a sequence of values corresponding to the first
sub-frame within each frame, the second buffer $B_2$ containing a sequence of values corresponding
to the second sub-frame within each frame etc.

If $w_{Bi}$ is the content of the i-th buffer, then it can be shown that:

$$ w_{Bi}[k] = w_e[k N_b + i], \quad k \in \{0, \ldots, L_b - 1\}, \tag{18} $$

where $L_b$ is the buffer length.
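
Reading equation (18) as "buffer i holds every $N_b$-th value of the multiplexed stream, starting at offset i", the de-multiplexing reduces to a reshape. This reading, the zero-based buffer index and the function name are assumptions made for illustration.

```python
import numpy as np

def demultiplex(w_e, N_b):
    """Split the time-multiplexed stream so that row i holds w_e[k * N_b + i]."""
    w_e = np.asarray(w_e)
    L_b = len(w_e) // N_b                        # buffer length
    return w_e[:L_b * N_b].reshape(L_b, N_b).T   # row i is the (i+1)-th buffer
```

For instance, a stream of eight values with four sub-frames per frame gives four buffers of length two, the first holding values 0 and 4 of the stream.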

For a raised cosine window shaping function, the energy of the embedded
watermark is concentrated near the center of the frame, such that the sub-frame best aligned
with the center of the frame will result in a distinctly better estimate of the embedded
watermark symbol than all the other sub-frames. Effectively, each buffer thus contains an
estimate of the symbol sequence, the estimates corresponding to the sequences having
different time offsets.

The sub-frame best aligned with the center of the frame (i.e. the best estimate
of the correctly aligned frame) is determined by correlating the contents of each buffer with
the reference watermark sequence. The sequence with the maximum correlation peak value is
chosen as the best estimate of the correctly aligned frame. The corresponding confidence
level, as described below, is used to determine the truth-value of the detection. Preferably, the
correlation process is halted once an estimated watermark sequence with a correlation peak
above the defined threshold has been found.
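By way of illustration only, the selection of the best aligned buffer can be sketched as below. The sketch assumes that each estimate has already been reduced to the length of the reference sequence, uses a direct (non-FFT) circular correlation, and compares absolute peak values; the function names and the optional `threshold` argument are hypothetical.

```python
def circular_correlation(x, ref):
    """Circular cross-correlation of x with ref, one value per lag."""
    n = len(ref)
    if len(x) != n:
        raise ValueError("estimate and reference must have the same length")
    return [sum(x[(k + lag) % n] * ref[k] for k in range(n)) for lag in range(n)]


def best_aligned_buffer(estimates, ref, threshold=None):
    """Return (buffer index, peak) of the estimate that correlates best.

    If a threshold is given, the search halts as soon as one estimate
    produces a peak above it, mirroring the early termination in the text.
    """
    best_index, best_peak = -1, float("-inf")
    for index, estimate in enumerate(estimates):
        peak = max(abs(c) for c in circular_correlation(estimate, ref))
        if peak > best_peak:
            best_index, best_peak = index, peak
        if threshold is not None and peak > threshold:
            break
    return best_index, best_peak


ref = [1, -1, 1, 1, -1, -1, 1, -1]
shifted = [ref[(j - 3) % 8] for j in range(8)]  # reference delayed by 3
print(best_aligned_buffer([[0.0] * 8, shifted], ref))  # (1, 8)
```

A practical detector would more likely compute the correlation in the frequency domain; the direct form is used here only to keep the example short.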

Typically, the length of each buffer is between 3 and 4 times the watermark
sequence length L_w, and is thus typically between 2048 and 8192 symbols, and N_b is
typically within the range of 2 to 8.

The buffer length is normally 3 to 4 times the watermark sequence length so that
each watermark symbol can be constructed by taking the average of several estimates of said
symbol. This averaging process is referred to as smoothing, and the number of times the
averaging is done is referred to as the smoothing factor s_f. Thus, given the buffer length L_b
and the watermark sequence length L_w, the smoothing factor s_f is such that:

\[ s_f = \frac{L_b}{L_w} \qquad (19) \]
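By way of illustration only, the smoothing step can be sketched as follows. The sketch assumes that a buffer holds s_f consecutive repetitions of the L_w-symbol watermark sequence, so that the estimates of symbol j sit at positions j, j + L_w, j + 2L_w, and so on; this layout and the function name `smooth` are assumptions of the example.

```python
def smooth(buffer, l_w):
    """Average the repeated estimates of each of the l_w watermark symbols.

    The smoothing factor s_f is taken as the number of complete
    repetitions of the sequence that fit in the buffer.
    """
    s_f = len(buffer) // l_w
    if s_f == 0:
        raise ValueError("buffer is shorter than the watermark sequence")
    return [sum(buffer[j + r * l_w] for r in range(s_f)) / s_f for j in range(l_w)]


# Three noisy repetitions of a 3-symbol sequence (s_f = 3).
print(smooth([1, 2, 3, 3, 4, 5, 5, 6, 7], l_w=3))  # [3.0, 4.0, 5.0]
```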

In another preferred embodiment, the detector refines the parameters used in
the offset search based upon the results of a previous search step. For instance, if a first series
of estimates shows that the results stored in buffer B_3 provide the best estimate of the
information signal, then the next offset search (either on the same received signal, or on the
signal received during the next detection window) is refined by shifting the position of the
sub-frames towards the position of the best estimate sub-frame. The estimates of the
sequence having zero offset can thus be iteratively improved.

As previously mentioned, there can exist a drift in sampling (clock) frequency 
in digital devices, which results in a stretch or shrink in the time domain signal. 

For instance, consider an audio segment s of length L that is time scaled such
that its new length becomes L(1 + η), where η is the time scaling factor, η being a
constant such that 1 + η > 0; η > 0 for a time stretch, and η < 0 for a time shrink.

When the signal is not time scale modified (η = 0), N_b estimates of the
watermark sequence are constructed by collecting the symbols stored in the N_b buffers
separately.

Fig. 12 illustrates four buffers (B_1, B_2, B_3, B_4), each buffer shown as a row of
boxes, with each box within a row indicating a separate location within the respective buffer.
The sequences w_{I,1}, w_{I,2}, w_{I,3}, w_{I,4} are respective estimates of the watermark
sequence. In the example shown in Fig. 12, it is assumed that the signal is not time scale
modified, and hence each estimate (w_{I,1}, w_{I,2}, w_{I,3}, w_{I,4}) represents an estimate of
the watermark sequence with a different time offset.

Consequently, each estimate (that is passed to the correlator 410) is formed by
sequentially collecting the entries from each buffer. For example, the first value in sequence
w_{I,1} (w_{I,1}[1]) is collected from the first location of B_1, the second (w_{I,1}[2]) from the
second location of B_1, etc., with the final value (w_{I,1}[L_b]) being collected from the final
location of the buffer. It will be appreciated that the arrows, which connect each box in a row
to the neighboring box, show the direction in which values of the sequence estimates are
collected from the buffer locations. It will also be appreciated that, whilst only eleven buffer
locations are shown for each buffer, the size of the buffers in practice is likely to be
significantly larger than this. For example, in the preferred embodiment, the length of each
buffer is typically between 2048 and 8192 locations, with the number of buffers typically being
between 2 and 8. However, in order to prevent overflow of buffers during time scale search, the
actual buffer lengths are set to (1 + |η_max|) times the typical lengths specified above, where
η_max is the expected maximum scaling factor.

When the received signal y'[n] has been time scale modified, it is necessary to
perform a time scale search in order to correctly estimate the watermark sequence. In the
present invention, such a search is performed by systematically combining the extracted
watermark sequence estimates (w_e[m]), preferably by systematically combining
(interpolating) the different estimates of the watermark sequences stored in the buffers.

Such time scale searches can be performed by utilizing any order of
interpolation. In the following two preferred embodiments, two orders of interpolation will be
described: the first order (linear) interpolation and the zero order interpolation. However, it
will be appreciated that this technique can be extended to higher orders of interpolation, e.g.
quadratic and cubic interpolation.

In the first embodiment, estimates of the time scaled watermark sequence are 
provided by applying linear interpolation to the previously extracted estimates of the 
watermark sequence. 

To this end, it can be assumed that the intermediate values w_e[k] generated by
the symbol extraction step shown in Fig. 8 are sequentially stored in a single buffer of length
M in place of the N_b buffers. In other words, the buffers are multiplexed into a single
buffer of length M = N_b s_f L_w, where L_w and s_f are as defined earlier. Let the resulting
sequence be represented by w_D. It can now be assumed that w_D represents discrete samples of
an otherwise continuous function. During time scale modification, these discrete points are
either pushed towards each other or stretched out. This in turn is translated to re-sampling of
the watermark function.

In this embodiment, re-sampling is realized via a linear interpolation
technique. That is, given the watermark sequence w_D[m], m = 1, ..., M, an interpolated
watermark sequence w_I[m] is generated as

\[ w_I[m] = \mu\, w_D[\lfloor (1+\eta) m \rfloor] + (1-\mu)\, w_D[\lceil (1+\eta) m \rceil] \qquad (20) \]

where μ = ⌈(1 + η)m⌉ − (1 + η)m, and ⌊·⌋ and ⌈·⌉ are the floor and the ceiling
operators, respectively. After the interpolation, the watermark sequences are folded back into
the N_b buffers in a similar way to that shown in Fig. 11. Let the interpolated watermark
sequence folded into the buffer b ∈ {0, ..., N_b − 1} be denoted by w_{I,b}[k]; then it can be
shown that

\[ w_{I,b}[k] = \mu\, w_D[\lfloor (N_b k + b)(1+\eta) \rfloor] + (1-\mu)\, w_D[\lceil (N_b k + b)(1+\eta) \rceil]. \qquad (21) \]
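By way of illustration only, the linear re-sampling of the multiplexed sequence and the subsequent folding back into the buffers can be sketched as below. The sketch uses zero-based indices, stops when the interpolation position runs past the last stored sample, and uses the hypothetical names `resample_linear` and `fold`; none of these details is prescribed by the embodiment.

```python
import math


def resample_linear(w_d, eta):
    """Linearly interpolate w_d at positions (1 + eta) * m.

    mu weights the sample at the floor position and (1 - mu) the sample
    at the ceiling position. Output stops once the ceiling position
    falls beyond the stored samples.
    """
    out = []
    for m in range(len(w_d)):
        x = (1.0 + eta) * m
        lo, hi = math.floor(x), math.ceil(x)
        if hi > len(w_d) - 1:
            break
        mu = hi - x
        out.append(mu * w_d[lo] + (1.0 - mu) * w_d[hi])
    return out


def fold(w_i, n_b):
    """Fold a sequence back into n_b buffers: buffer b gets w_i[n_b * k + b]."""
    l_b = len(w_i) // n_b
    return [[w_i[n_b * k + b] for k in range(l_b)] for b in range(n_b)]


print(resample_linear([0, 10, 20, 30, 40], eta=0.5))  # [0.0, 15.0, 30.0]
print(fold([0, 1, 2, 3, 4, 5], n_b=2))                # [[0, 2, 4], [1, 3, 5]]
```

A candidate scale η would be tested by passing `fold(resample_linear(w_d, eta), n_b)` to the correlation stage.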

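Let, for b = 1, ..., N_b, w_{D,b}[k] be the pre-interpolation sequence stored in the b-th buffer,
and let q_{bk} ∈ {1, ..., s_f L_w} and r_{bk} ∈ {1, ..., N_b} be defined as

\[ q_{bk} = \left\lfloor \frac{(N_b k + b)(1+\eta)}{N_b} \right\rfloor \]

and

\[ r_{bk} = \lfloor (N_b k + b)(1+\eta) \rfloor - N_b \left\lfloor \frac{(N_b k + b)(1+\eta)}{N_b} \right\rfloor. \]

Then, it can be shown that w_D[⌊(N_b k + b)(1 + η)⌋] = w_{D,r_{bk}}[q_{bk}].
Putting this into equation (21), it follows that

\[ w_{I,b}[k] = \mu\, w_{D,r_{bk}}[q_{bk}] + (1-\mu)\, w_{D,r_{bk}+1}[q_{bk}] \qquad (22) \]

where a buffer index r_{bk} + 1 that falls beyond the last buffer wraps around to the first
buffer at entry q_{bk} + 1.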

20 

Thus, the interpolated buffer entries can be calculated directly from the N_b sequences w_{D,b},
b = 1, ..., N_b (as shown in Fig. 8, being passed to the correlator 410), by evaluating equation (22).

A further embodiment of the present invention will now be described, in
which estimates of the time scaled watermark sequence are provided by applying zero order
interpolation to the previously extracted estimates of the watermark sequence. This approach
can be represented with equation (22) with μ = 1. In this case, the interpolation function can
be written as

\[ w_{I,b}[k] = w_{D,r_{bk}}[q_{bk}] \qquad (23) \]

where q_{bk} ∈ {1, ..., s_f L_w} and r_{bk} ∈ {1, ..., N_b} are as defined above.
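By way of illustration only, zero order interpolation can be sketched as a pure re-indexing of the existing buffers: the scaled position is split into a quotient (the location within a buffer) and a remainder (the buffer to read from). The sketch uses zero-based indices for both quantities, truncates an estimate when the computed location falls outside the buffer, and uses the hypothetical name `zero_order_estimates`.

```python
import math


def zero_order_estimates(buffers, eta):
    """Build one time scaled estimate per buffer by nearest-lower selection.

    For estimate b and entry k, the position floor((n_b*k + b)*(1 + eta))
    in the multiplexed sequence is mapped to buffer r at location q,
    where q and r are the quotient and remainder of that position by n_b.
    """
    n_b = len(buffers)
    l_b = len(buffers[0])
    estimates = []
    for b in range(n_b):
        estimate = []
        for k in range(l_b):
            position = math.floor((n_b * k + b) * (1.0 + eta))
            q, r = divmod(position, n_b)
            if q >= l_b:
                break  # ran past the stored values
            estimate.append(buffers[r][q])
        estimates.append(estimate)
    return estimates


buffers = [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
print(zero_order_estimates(buffers, eta=0.25))
# [[0, 5, 10], [1, 6, 11], [2, 7], [3, 8]]
```

No arithmetic is performed on the stored values themselves, which is why this variant adds little computational cost to a scale search.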

A graphical interpretation of equation (23) is shown in Figs. 13a & b. Fig. 13a
shows how the different estimates of the correct watermark sequences (w_{I,1}, w_{I,2}, w_{I,3},
w_{I,4}) are extracted from the buffers for a time stretch, whilst Fig. 13b shows similar
information for a time shrink. As in Fig. 12, each row of boxes represents a respective buffer,
with each box representing a location within each buffer. The arrows indicate the order in which
the buffer contents are collected to form the estimates of the watermark sequences.

When the audio signal is time scale modified, the start and the end of the
framing will gradually drift backward or forward, depending respectively upon whether the
signal is time scale stretched or compressed. The watermark symbol combining stage
according to this embodiment tracks the size of the drift. When the absolute value of the
cumulative drift exceeds T_s/N_b (where N_b is the number of buffers, i.e. the number of
consecutive symbols that represent a single watermark symbol), then the symbol collection
sequence from the buffers is adjusted to provide the next best estimate of the symbol from the
buffers. In other words, the buffer counters are incremented or decremented (depending on
drift direction), and a circular rotation of the buffer pointer for each watermark sequence
estimation (w_{I,1}, w_{I,2}, w_{I,3}, w_{I,4}) is performed.

Let k be the buffer entry counter, where k is an integer representing each
location within each buffer, i.e. k = 1 represents the first location within each buffer, k = 2 the
second, etc. If the estimates of the watermark sequence are being taken from the buffers with
no time scale modification (as shown in Fig. 12), then it will be appreciated that the values in
the first sequence can be represented by w_{I,1}[k].

However, for time scaled estimates, assuming that an estimate η is being made
of the time scale, then when |ηk| ≈ n/N_b, where n is any integer (and in this example N_b = 4),
the counter values and the buffers from which the watermark estimates are taken are
changed.

If η is positive (time stretch), the counter for the first buffer is incremented.
The ordering of the buffers is also circularly shifted (i.e. the watermark sequence estimate
w_{I,1} previously being taken from buffer one will now be taken from buffer four, the estimate
from buffer two will now be taken from buffer one, the estimate from buffer three will now be
taken from buffer two, and the estimate from buffer four will now be taken from buffer three). A
similar circular shift is also performed on the buffer counter k. This is shown
diagrammatically in Fig. 13a.

If η is negative (time shrink), the counter for the first buffer is incremented,
and the ordering of the buffers is circularly shifted (i.e. the watermark sequence estimate
w_{I,1} previously being taken from buffer one will now be taken from buffer two, the estimate
from buffer two will now be taken from buffer three, the estimate from buffer three will now be
taken from buffer four, and the estimate from buffer four will now be taken from buffer one). A
similar circular shift is also performed on the buffer counter k. This is shown
diagrammatically in Fig. 13b.

After these circular shifts and adjustments to the buffer counters have been
performed, the symbol collection to form the different estimates of the watermark sequences
continues from left to right until |ηk| ≈ (n + 1)/N_b (i.e. the next interchange position is
reached). The process of buffer order interchanging and the sequential symbol collection is
then repeated until the end of the buffer is reached.
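By way of illustration only, the buffer locations at which an interchange takes place can be sketched as follows. The sketch measures the cumulative drift in units of the symbol period, so that an interchange is due each time |ηk| passes a further multiple of 1/N_b; the function name `interchange_positions` and the use of one-based locations k are assumptions of the example.

```python
import math


def interchange_positions(eta, n_b, l_b):
    """Return the buffer locations k at which the buffer order is rotated.

    A rotation is due whenever the cumulative drift |eta * k| has passed
    one more multiple of 1 / n_b than at the previous location.
    """
    positions = []
    shifts_done = 0
    for k in range(1, l_b + 1):
        shifts_needed = math.floor(abs(eta) * k * n_b)
        if shifts_needed > shifts_done:
            positions.append(k)
            shifts_done = shifts_needed
    return positions


# With eta = 1/64 and four buffers, the drift reaches 1/4 of a symbol
# period every 16 locations.
print(interchange_positions(1 / 64, n_b=4, l_b=50))  # [16, 32, 48]
```

The sign of η only determines the direction of the rotation, so the positions are the same for a stretch and a shrink of equal magnitude.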

Consequently, it will be appreciated that a zeroth order interpolation of the
time scaled watermark sequence has been performed. In other words, the time scaled
watermark sequence has been estimated by selecting those values from the original, non time
scaled watermark sequence estimates that would most closely correspond to the temporal
positions of the time scaled watermark sequence. By utilizing previously extracted estimates
of the watermark sequence, such a technique efficiently resolves the problems of estimating
correctly time scaled watermarks, with minimal cost in terms of computational overhead.

Such estimates of the time scaled watermark sequence will then be passed to
the correlator (410), so as to determine whether the predicted time scale η accurately
represents the time scale of the received signal, i.e. whether the estimates provided to the
correlator provide good correlation peaks. If not, then the time scale search will be repeated
for a different estimated value, i.e. a different value of η.

Due to possible time scale modification, the detection truth-value (whether or
not the signal includes a watermark) is determined only after the appropriate scale search has
been conducted. Let Δη be the scale search step size and let us assume that we want the
watermark to survive all the scale modifications in the interval [η_min, η_max]. The total number
of visited scales is then given by

\[ N_\eta = \frac{\eta_{\max} - \eta_{\min}}{\Delta\eta} \qquad (24) \]

To minimize N_η it is preferred to find the maximum value of Δη that can still allow an
exhaustive scale search. To this end, experimental results show that the detection
performance is not significantly affected if the time scaling does not exceed half of the
inverse of the buffer length. This means that, for an exhaustive scale search, Δη should be
such that

\[ \Delta\eta \le \frac{1}{N_b s_f L_w} \]

Putting this into equation (24), it follows that it is preferable to conduct a search over

\[ N_\eta = (\eta_{\max} - \eta_{\min})\, N_b s_f L_w \]

time scales in order to conduct an exhaustive scale search. Clearly, any scale search can be
time consuming. Thus, the complexity issue and cost in computing overhead should be
considered when choosing the watermark embedding parameters N_b, s_f and L_w.

In one preferred embodiment the scale search is adapted such that information
acquired during detection is utilized to plan an optimum search in the subsequent detection
windows. For example, the scale search in the next detection window is started around the
current optimum scale.

An alternative embodiment illustrated in Fig. 14 provides a method for
an efficient walk through the scale space by grid refinement. The most straightforward solution
is a linear search from the minimum scale towards the maximum scale by adding up an
incremental step. Assuming correlation, and thus confidence level, does not change abruptly
from one scale to the next, one can considerably reduce the number of scales visited during
the search by starting with a coarse grid and progressively refining its granularity. As shown
in Fig. 14, the algorithm starts at scale zero and is repeated until a minimum granularity is
reached or the watermark is detected (i.e., a local maximum for the confidence level is found)
and/or the confidence level exceeds a predetermined threshold. When one has an indication where
to start the scale search (e.g. an initial estimation from a previous detection), a random or
linear search around this scale may suffice.
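By way of illustration only, one possible coarse-to-fine walk through the scale space is sketched below. It is not a reproduction of the algorithm of Fig. 14: the number of grid points per pass, the way the interval is narrowed around the best scale, and the names `grid_refinement_search` and `confidence` (a caller-supplied function returning the confidence level for a candidate scale) are all assumptions.

```python
def grid_refinement_search(confidence, eta_min, eta_max, threshold, min_step, points=5):
    """Search for a scale whose confidence level exceeds the threshold.

    Starts at scale zero, evaluates a coarse grid, then repeatedly narrows
    the interval around the best scale found so far. Stops when the
    threshold is exceeded or the grid step drops below min_step.
    Returns (best scale, best confidence, number of scales visited).
    """
    if points < 4:
        raise ValueError("at least 4 grid points are needed for the interval to shrink")
    if min_step <= 0:
        raise ValueError("min_step must be positive")
    lo, hi = eta_min, eta_max
    best_eta = 0.0
    best_conf = confidence(best_eta)
    visited = 1
    while best_conf <= threshold:
        step = (hi - lo) / (points - 1)
        if step < min_step:
            break  # minimum granularity reached without detection
        for i in range(points):
            eta = lo + i * step
            conf = confidence(eta)
            visited += 1
            if conf > best_conf:
                best_eta, best_conf = eta, conf
        lo = max(eta_min, best_eta - step)
        hi = min(eta_max, best_eta + step)
    return best_eta, best_conf, visited


# Toy confidence function peaking at a scale of 0.6%.
eta, conf, visited = grid_refinement_search(
    lambda e: 10 - 1000 * abs(e - 0.006), -0.01, 0.01, threshold=9.5, min_step=1e-5)
print(eta, conf, visited)
```

The sketch relies on the stated assumption that the confidence level varies smoothly with scale; a sharply peaked confidence function could be missed by the coarse passes.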

As shown in Fig. 8, outputs (w_{D,1}, w_{D,2}, ..., w_{D,N_b}) from the buffering stage are
passed to the interpolation stage and, after interpolation, the outputs (w_{I,1}, w_{I,2}, ...,
w_{I,N_b}) of this stage, which are needed to resolve a possible time scale modification in the
watermarked signal, are passed to the correlation and decision stage. All of the estimates
(w_{I,1}, w_{I,2}, ..., w_{I,N_b}) of the watermark corresponding to the different possible offset
values are passed to the correlation and decision stage 400.

The correlator 410 calculates the correlation of each estimate w_{I,j}, j = 1, ..., N_b
with respect to the reference watermark sequence w_s[k]. Each respective correlation output
corresponding to each estimate is then applied to the maximum detection unit 420 which
determines which two estimates provided the maximum correlation peak values. These
estimates are chosen as the ones that best fit the circularly shifted versions w_d1 and w_d2 of the
reference watermark. The correlation values for these estimated sequences are passed to the
threshold detector and payload extractor unit 430.

The reference watermark sequence w_s used within the detector corresponds to
(a possibly circularly shifted version of) the original watermark sequence applied to the host 
signal. For instance, if the watermark signal was calculated using a random number generator 
with seed S within the embedder, then equally the detector can calculate the same random 
number sequence using the same random number generation algorithm and the same initial 
seed S so as to determine the watermark signal. Alternatively, the watermark signal originally 
applied in the embedder and utilized by the detector as a reference could simply be any 
predetermined sequence. 

Fig. 15 shows a typical shape of a correlation function as output from the 
correlator 410. The horizontal scale shows the correlation delay (in terms of the sequence 
samples). The vertical scale on the left hand side (referred to as the confidence level cL) 
represents the value of the correlation peak normalized with respect to the standard deviation 
of the normally distributed correlation function. 
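By way of illustration only, the normalization that yields the confidence level can be sketched as follows. The sketch estimates the standard deviation from all lags of the correlation output, which is an assumption (the text does not say how the standard deviation is obtained), and uses the hypothetical name `confidence_levels`.

```python
import statistics


def confidence_levels(correlation):
    """Express each correlation value in units of the standard deviation."""
    sigma = statistics.pstdev(correlation)
    if sigma == 0:
        raise ValueError("correlation output has zero spread")
    return [c / sigma for c in correlation]


print(confidence_levels([2, -2, 2, -2]))  # [1.0, -1.0, 1.0, -1.0]
```

With long sequences the few peak values contribute little to the estimated spread, although a stricter implementation might exclude them before computing the standard deviation.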

As can be seen, the typical correlation is relatively flat with respect to cL, and
centered about cL = 0. However, the function contains two peaks, which are separated by pL
(see equation 6) and extend upwards to cL values that are above the detection threshold when
a watermark is present. When the correlation peaks are negative, the above statement applies
to their absolute values.

A horizontal line (shown in the Fig. as being set at cL = 8.7) represents the
detection threshold. The detection threshold value controls the false alarm rate.

Two kinds of false alarms exist: the false positive rate, defined as the
probability of detecting a watermark in non watermarked items, and the false negative rate,
which is defined as the probability of not detecting a watermark in watermarked items.
Generally, the requirement on the false positive rate is more stringent than that on the false
negative rate. The scale on the right hand side of Fig. 15 illustrates the probability of a false
positive alarm p. As can be seen in the example shown, the probability of a false positive
p = 10^-12 is equivalent to the threshold cL = 8.7, whilst p = 10^-83 is equivalent to cL = 20.

After each detection interval, the detector determines whether the original
watermark is present or whether it is not present, and on this basis outputs a "yes" or a "no"
decision. If desired, to improve this decision making process, a number of detection windows
may be considered. In such an instance, the false positive probability is a combination of the
individual probabilities for each detection window considered, dependent upon the desired
criteria. For instance, it could be determined that if the correlation function has two peaks
above a threshold of cL = 7 on any two out of three detection intervals, then the watermark is
deemed to be present. Such detection criteria can be altered depending upon the desired use
of the watermark signal and to take into account factors such as the original quality of the
host signal and how badly the signal is likely to be corrupted during normal transmission.
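By way of illustration only, the combined false positive probability for a "k out of n detection windows" criterion can be sketched as below. The sketch assumes that the windows are statistically independent and share the same per-window false positive probability p; both are simplifying assumptions, and the function name `at_least_k_of_n` is hypothetical.

```python
from math import comb


def at_least_k_of_n(p, k, n):
    """Probability that at least k of n independent windows fire,
    each with probability p (binomial tail)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Two-out-of-three rule with an illustrative per-window probability.
print(at_least_k_of_n(0.5, 2, 3))  # 0.5
```

For small p the two-out-of-three rule yields a combined probability of roughly 3p², which is why a lower per-window threshold can be tolerated when several windows are combined.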

The payload extractor unit 430 may subsequently be utilized to extract the
payload (e.g. information content) from the detected watermark signal. Once the unit has
estimated the two correlation peaks cL_1 and cL_2 that exceed the detection threshold, an
estimate pL' of the circular shift pL (defined in equation (6)) is derived as the distance
between the peaks. Next, the signs p_1 and p_2 of the correlation peaks are determined, and
hence r_sign is calculated from equation (7). The overall watermark payload may then be
calculated using equation (8).

For instance, it can be seen in Fig. 15 that pL is the relative distance between
the two peaks. Both peaks are positive, i.e. p_1 = +1 and p_2 = +1. From equation (7), r_sign = 3.
Consequently, the payload pL_w = <3, pL>.

It will be appreciated by the skilled person that various implementations not
specifically described would be understood as falling within the scope of the present
invention. For instance, whilst only the functionality of the detecting apparatus has been
described, it will be appreciated that the apparatus could be realized as a digital circuit, an
analog circuit, a computer program, or a combination thereof.

Equally, whilst the above embodiment has been described with reference to an 
audio signal, it will be appreciated that the present invention can be applied to add 
information to other types of signal, for instance information or multimedia signals, such as 
video and data signals. 

Further, it will be appreciated that the invention can be applied to 
watermarking schemes containing only one watermarking sequence (i.e. a 1-bit scheme), or 
to watermarking schemes containing multiple watermarking sequences. Such multiple 
sequences can be simultaneously or successively embedded within the host signal. 

Within the specification it will be appreciated that the word "comprising" does
not exclude other elements or steps, that "a" or "an" does not exclude a plurality, and that a
single processor or other unit may fulfil the functions of several means recited in the claims.



